自我训练在半监督学习中表现出巨大的潜力。它的核心思想是使用在标记数据上学习的模型来生成未标记样本的伪标签,然后自我教学。为了获得有效的监督,主动尝试通常会采用动量老师进行伪标签的预测,但要观察确认偏见问题,在这种情况下,错误的预测可能会提供错误的监督信号并在培训过程中积累。这种缺点的主要原因是,现行的自我训练框架充当以前的知识指导当前状态,因为老师仅与过去的学生更新。为了减轻这个问题,我们提出了一种新颖的自我训练策略,该策略使模型可以从未来学习。具体而言,在每个培训步骤中,我们都会首先优化学生(即,在不将其应用于模型权重的情况下缓存梯度),然后用虚拟未来的学生更新老师,最后要求老师为伪标记生产伪标签目前的学生作为指导。这样,我们设法提高了伪标签的质量,从而提高了性能。我们还通过深入(FST-D)和广泛(FST-W)窥视未来,开发了我们未来自我训练(FST)框架的两个变体。将无监督的域自适应语义分割和半监督语义分割的任务作为实例,我们在广泛的环境下实验表明了我们方法的有效性和优越性。代码将公开可用。
translated by 谷歌翻译
在这项工作中,我们探讨了用于语义分割知识蒸馏的数据增强。为了避免过度适合教师网络中的噪音,大量培训示例对于知识蒸馏至关重要。 Imagelevel论证技术(例如翻转,翻译或旋转)在先前的知识蒸馏框架中广泛使用。受到功能空间上语义方向的最新进展的启发,我们建议在功能空间中包括以进行有效蒸馏的功能。具体而言,给定语义方向,可以在功能空间中为学生获得无限数量的增强。此外,分析表明,可以通过最大程度地减少增强损失的上限来同时优化这些增强。基于观察结果,开发了一种用于语义分割的知识蒸馏的新算法。对四个语义分割基准测试的广泛实验表明,所提出的方法可以提高当前知识蒸馏方法的性能而没有任何明显的开销。代码可在以下网址获得:https://github.com/jianlong-yuan/fakd。
translated by 谷歌翻译
随着大数据时代的出现,数据质量问题变得越来越重要。在许多因素中,缺少价值的数据是一个主要问题,因此开发有效的插补模型是研究界的关键主题。最近,一个主要的研究方向是采用神经网络模型,例如自组织映射或自动编码器来填充缺失值。但是,这些经典方法几乎无法在数据属性之间同时发现相关特征和共同特征。特别是,对于经典的自动编码器来说,这是一个非常典型的问题,他们经常学习无效的恒定映射,从而极大地伤害了填充性能。为了解决上述问题,我们建议并开发基于功能融合增强自动编码器的缺失值填充模型。我们首先设计并集成到自动编码器中,一个隐藏的层,该层由脱落神经元和径向基函数神经元组成,该神经元可以增强学习相关特征和共同特征的能力。此外,我们基于动态聚类(MVDC)制定了缺失的值填充策略,该策略已纳入迭代优化过程。该设计可以增强多维功能融合能力,从而提高动态协作缺失填充性能。通过实验比较与许多缺失值填充方法的实验比较来验证我们的模型的有效性,这些方法在七个数据集上进行了测试,而缺失率不同。
translated by 谷歌翻译
多模式学习,尤其是大规模的多模式预训练,在过去的几年中已经迅速发展,并带来了人工智能(AI)的最大进步。尽管具有有效性,但了解多模式预训练模型的潜在机制仍然是一个巨大的挑战。揭示此类模型的解释性可能会使AI领域中新型学习范式的突破。为此,鉴于人脑的多模式性质,我们建议借助非侵入性脑成像技术(例如功能磁共振成像(fMRI))探索多模式学习模型的解释性。具体而言,我们首先提出了1500万个图像文本对预训练的新设计的多模式基础模型,该模型在各种认知下游任务中显示出强烈的多模式理解和概括能力。此外,从神经编码的角度来看(基于我们的基础模型),我们发现,与单峰相比,经过多模式训练的视觉和舌编码器都更像脑状。特别是,我们确定了许多大脑区域,其中多模式训练的编码器表现出更好的神经编码性能。这与现有有关探索大脑多感觉整合的研究的发现是一致的。因此,我们认为,多模式基础模型是神经科学家研究人脑中多模式信号处理机制的更合适的工具。我们的发现还证明了多模式基础模型作为理想的计算模拟器的潜力,以促进脑和大脑的AI研究。
translated by 谷歌翻译
尽管Sylvester方程在各种图形挖掘应用程序(例如半监督标签学习和网络对齐)上取得了成功,但仍存在一些限制。Sylvester方程无法建模非线性关系以及对不同任务进行调整的僵化性限制了其绩效。在本文中,我们提出了一个端到端的神经框架Symgnn,该框架由多网络神经聚合模块和先前的多网络协会结合学习模块组成。提出的框架继承了Sylvester方程的关键思想,同时将其推广以克服上述局限性。对现实世界数据集的经验评估表明,Symgnn总体的实例超过了几何矩阵完成任务中的基准,其低级别的实例化可以将记忆消耗降低16.98%\%。
translated by 谷歌翻译
网络嵌入的目的是学习将节点映射到欧几里德空间的函数有助于网络上的多个学习分析任务。然而,现实世界网络背后的嘈杂信息和过度装备问题对嵌入向量的质量产生负面影响。为了解决这些问题,研究人员利用对网络嵌入(Advtne)的对抗培训并实现最先进的表现。与主流方法引入网络结构或数据特征的扰动不同,Advtne直接erurbs Metturbs参数,这提供了了解后面的机制的新机会。在本文中,我们从优化的角度理论上解释了Advtne。考虑到网络的权力法属性和优化目标,我们分析了其优异成果的原因。根据上述分析,我们提出了一种新的激活,以提高Advtne的性能。我们对四个真实网络进行广泛的实验,以验证我们在节点分类和链路预测中的方法的有效性。结果表明,我们的方法优于最先进的基线方法。
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-train procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
translated by 谷歌翻译
Differentiable Architecture Search (DARTS) has attracted considerable attention as a gradient-based Neural Architecture Search (NAS) method. Since the introduction of DARTS, there has been little work done on adapting the action space based on state-of-art architecture design principles for CNNs. In this work, we aim to address this gap by incrementally augmenting the DARTS search space with micro-design changes inspired by ConvNeXt and studying the trade-off between accuracy, evaluation layer count, and computational cost. To this end, we introduce the Pseudo-Inverted Bottleneck conv block intending to reduce the computational footprint of the inverted bottleneck block proposed in ConvNeXt. Our proposed architecture is much less sensitive to evaluation layer count and outperforms a DARTS network with similar size significantly, at layer counts as small as 2. Furthermore, with less layers, not only does it achieve higher accuracy with lower GMACs and parameter count, GradCAM comparisons show that our network is able to better detect distinctive features of target objects compared to DARTS.
translated by 谷歌翻译
We study estimation and testing in the Poisson regression model with noisy high dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a non-convex target function to minimize. Treating the high dimensional issue further leads us to augment an amenable penalty term to the target function. We propose to estimate the regression parameter through minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove the variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slow. We develop Wald and score tests based on the asymptotic normality of the estimator, which permits testing of linear functions of the members if the subset. We examine the finite sample performance of the proposed tests by extensive simulation. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which motivated this work initially.
translated by 谷歌翻译